TRAVEL TO PRIMARY ELECTION: 2016 Election Contributions in Alabama, by Jay Cheong

## [1] "/Users/JAY/Desktop/Udacity/Project4/Project4_R/PROJECT4"
##    cmte_id               cand_id                          cand_nm    
##  Length:23645       P60006111:7550   Cruz, Rafael Edward 'Ted':7550  
##  Class :character   P60007168:5446   Sanders, Bernard         :5446  
##  Mode  :character   P60005915:4372   Carson, Benjamin S.      :4372  
##                     P00003392:3519   Clinton, Hillary Rodham  :3519  
##                     P60006723:1142   Rubio, Marco             :1142  
##                     P40003576: 336   Paul, Rand               : 336  
##                     (Other)  :1280   (Other)                  :1280  
##                        contbr_nm         contbr_city    contbr_st 
##  GUEVARA, MARIETTA S. DR.   :  253   BIRMINGHAM: 3295   AL:23645  
##  GUEVARA, MARIETTA S. S. DR.:  108   HUNTSVILLE: 2219             
##  FORREST, ERIC              :  102   MOBILE    : 1255             
##  CHILDERS, DICEY S. MRS.    :   86   MONTGOMERY: 1056             
##  HAMILTON, LAURA            :   78   MADISON   :  951             
##  MARKY, DONNA               :   78   TUSCALOOSA:  595             
##  (Other)                    :22940   (Other)   :14274             
##    contbr_zip             contbr_employer 
##  Min.   :       71   RETIRED      : 6020  
##  1st Qu.:352166810   NONE         : 1321  
##  Median :357564271   N/A          : 1193  
##  Mean   :343661041   SELF EMPLOYED: 1135  
##  3rd Qu.:361177552   SELF-EMPLOYED:  819  
##  Max.   :369167108   (Other)      :13129  
##  NA's   :3           NA's         :   28  
##                               contbr_occupation contb_receipt_amt
##  RETIRED                               : 6616   Min.   :-5400    
##  NOT EMPLOYED                          : 1696   1st Qu.:   25    
##  HOMEMAKER                             :  641   Median :   40    
##  PHYSICIAN                             :  615   Mean   :  134    
##  INFORMATION REQUESTED PER BEST EFFORTS:  608   3rd Qu.:  100    
##  (Other)                               :13467   Max.   :10800    
##  NA's                                  :    2                    
##   contb_receipt_dt                                  receipt_desc  
##  29-FEB-16:  457                                          :23055  
##  31-MAR-16:  332   REDESIGNATION TO GENERAL               :  169  
##  05-APR-16:  315   REDESIGNATION FROM PRIMARY             :  166  
##  31-DEC-15:  308   Refund                                 :   79  
##  05-MAR-16:  265   REATTRIBUTION / REDESIGNATION REQUESTED:   35  
##  30-APR-16:  256   REATTRIBUTION FROM SPOUSE              :   32  
##  (Other)  :21712   (Other)                                :  109  
##  memo_cd                                 memo_text      form_tp     
##   :22666                                      :17236   SA17A:23064  
##  X:  979   * EARMARKED CONTRIBUTION: SEE BELOW: 5182   SA18 :  502  
##            * HILLARY VICTORY FUND             :  496   SB28A:   79  
##            REDESIGNATION TO GENERAL           :  169                
##            REDESIGNATION FROM PRIMARY         :  166                
##            EARMARKED FROM MAKE DC LISTEN      :   98                
##            (Other)                            :  298                
##     file_num                       tran_id      election_tp  
##  Min.   :1003942   ADA11F117B690406CBE7:    3        :    6  
##  1st Qu.:1056899   C2044921            :    2   G2016:  278  
##  Median :1057795   C2831133            :    2   P2016:23361  
##  Mean   :1059548   C3903522            :    2                
##  3rd Qu.:1066824   C3915448            :    2                
##  Max.   :1074038   C3944632            :    2                
##                    (Other)             :23632                
##     NA           modifiedDate       
##  Mode:logical   Min.   :2014-12-22  
##  NA's:23645     1st Qu.:2015-11-11  
##                 Median :2016-02-04  
##                 Mean   :2016-01-07  
##                 3rd Qu.:2016-03-14  
##                 Max.   :2016-04-30  
## 
The dataset records 23,645 contributions, with 23 columns of information for each.

Univariate Plots Section

This plot shows the top five candidates by number of contributions. Ted Cruz was in first place and Bernard Sanders in second. Hillary Clinton took fourth place.
This plot shows the total amount each candidate received, ordered by popularity (number of contributions). Cruz took first place as expected. Sanders was second in popularity but received comparatively little money, even less than Clinton and Rubio.
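The two rankings described above (number of contributions and total dollars) can be sketched in base R. This is a minimal illustration, not the code behind the plots: the column names `cand_nm` and `contb_receipt_amt` come from the summary output, while the rows and the shortened candidate names are invented.

```r
# Toy stand-in for the FEC data; the rows are invented for illustration.
contrib <- data.frame(
  cand_nm = c("Cruz", "Cruz", "Cruz", "Sanders", "Sanders", "Clinton"),
  contb_receipt_amt = c(25, 50, 40, 15, 27, 2700),
  stringsAsFactors = FALSE
)

# Popularity: number of contributions per candidate, most frequent first.
counts <- sort(table(contrib$cand_nm), decreasing = TRUE)

# Money: total dollars per candidate, largest first.
totals <- sort(tapply(contrib$contb_receipt_amt, contrib$cand_nm, sum),
               decreasing = TRUE)

print(counts)
print(totals)
```

In the toy data the two orderings disagree, which is the same kind of mismatch the plots show between popularity and money received.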

This plot shows contributors' occupations, that is, what kinds of people supported the candidates. Retired people contributed far more often than any other occupation group, which is somewhat surprising.

This plot shows contributor names by frequency. The one thing I could make out is that a single contributor, named GUEVARA, contributed much more often than the others.

##  [1]    4    0    1    5   25  133  147  168  143  372  357  521  546  691
## [15]  887 1253 1238 1089 1487 1130 2349 3234 2870 2631 2108  256
## 
##  Shapiro-Wilk normality test
## 
## data:  his_pc$counts
## W = 0.84241, p-value = 0.001017

I tried the Shapiro-Wilk test on the histogram counts. The result is W = 0.84241 with p-value = 0.001017. The p-value is close to zero, so the counts are not normally distributed.
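The test above can be rerun from the printed counts alone. The sketch below copies the 26 bin counts from the output; the threshold `alpha` is an assumption chosen for illustration (the conventional 0.05) and is not stated in the original analysis.

```r
# Histogram counts copied from the output above (26 bins).
counts <- c(4, 0, 1, 5, 25, 133, 147, 168, 143, 372, 357, 521, 546, 691,
            887, 1253, 1238, 1089, 1487, 1130, 2349, 3234, 2870, 2631, 2108, 256)

# Shapiro-Wilk tests the null hypothesis that the values come from a
# normal distribution; a small p-value is evidence against normality.
result <- shapiro.test(counts)
print(result)

alpha <- 0.05  # assumed significance level, for illustration only
reject_normality <- result$p.value < alpha
```

Note that this tests the bin heights, not the contribution dates themselves, so the conclusion is about the shape of the histogram counts only.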

I tried several bin widths on the date axis and found that contribution frequency peaks at the end of each month. As in the first plot, most of the contributed money arrives at the end of the month. I am not sure why, but I receive my own paycheck near the end of the month, so contributors may simply have enough money on hand at that time to send to their candidate.
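One way to check the end-of-month pattern without depending on bin width is to count how many days remain in the month for each contribution. This is a sketch with invented dates; the three-day window is an arbitrary choice for illustration.

```r
# Invented receipt dates, already parsed as Date objects.
dates <- as.Date(c("2016-02-29", "2016-02-28", "2016-02-10",
                   "2016-03-31", "2016-03-15", "2015-12-31"))

# Last day of each month: take the 1st of the month, add 31 days (which
# always lands in the next month), snap back to the 1st, subtract one day.
first_of_month <- as.Date(format(dates, "%Y-%m-01"))
first_of_next  <- as.Date(format(first_of_month + 31, "%Y-%m-01"))
month_end      <- first_of_next - 1

# Days remaining in the month for each contribution (0 = last day).
days_left <- as.integer(month_end - dates)

# Share of contributions made in the last three days of a month.
share_month_end <- mean(days_left <= 2)
print(table(days_left))
print(share_month_end)
```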

Univariate Analysis

What is the structure of your dataset?

There are 23 variables and 23,645 observations. Some of the variables are useful for the analysis, for example candidate name, contribution amount, receipt date, contributor city, contributor occupation, and contributor employer. Others, such as IDs and memo fields, were not needed for the analysis.
The values below are listed from most contributions to fewest:
Candidate: Ted Cruz, Sanders, Carson, Hillary Clinton, Rubio, ..., Stein
Year: 2016, 2015
City: Birmingham, Huntsville, Mobile, Montgomery, etc.

What is/are the main feature(s) of interest in your dataset?

I don't know politics well, but I have heard that rich people support Hillary Clinton and that the other side supports Sanders. Mainly, I will look at the relationship between contribution amount and other variables such as candidate, contributor, and city.

What other features in the dataset do you think will help support your investigation into your feature(s) of interest?

Candidate, contributor, contribution amount, city, receipt date, contributor employer, and contributor occupation.

Did you create any new variables from existing variables in the dataset?

Yes, I created Modified Date, Year, Month, Day.
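A minimal sketch of how such variables could be derived from `contb_receipt_dt`, whose values look like `29-FEB-16` in the summary output. The helper `parse_fec_date` is hypothetical (it is not from the original analysis), and it assumes every two-digit year belongs to the 2000s. It uses the built-in `month.abb` constant so the result does not depend on the system locale.

```r
# Hypothetical helper: turn strings such as "29-FEB-16" into Date objects.
# Assumption: two-digit years are 20xx, which holds for this 2014-2016 data.
parse_fec_date <- function(x) {
  parts <- strsplit(as.character(x), "-", fixed = TRUE)
  day   <- as.integer(vapply(parts, `[`, character(1), 1))
  mon   <- match(toupper(vapply(parts, `[`, character(1), 2)),
                 toupper(month.abb))
  year  <- 2000L + as.integer(vapply(parts, `[`, character(1), 3))
  as.Date(sprintf("%04d-%02d-%02d", year, mon, day))
}

# Three date strings taken from the summary output above.
contrib <- data.frame(
  contb_receipt_dt = c("29-FEB-16", "31-DEC-15", "05-APR-16"),
  stringsAsFactors = FALSE
)

contrib$modifiedDate <- parse_fec_date(contrib$contb_receipt_dt)
contrib$Year  <- as.integer(format(contrib$modifiedDate, "%Y"))
contrib$Month <- as.integer(format(contrib$modifiedDate, "%m"))
contrib$Day   <- as.integer(format(contrib$modifiedDate, "%d"))
print(contrib)
```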

Of the features you investigated, were there any unusual distributions? Did you perform any operations on the data to tidy, adjust, or change the form of the data? If so, why did you do this?

I changed the date bin width several times and noticed that people contribute most often at the end of the month. That may be when monthly salaries are paid, as mine is, so people may be more willing to send money to their candidate when they have enough on hand.

Bivariate Plots Section

The plot above shows the monthly contribution frequency for 2015 and 2016. People contributed far more in 2016 than in 2015 because the election was just around the corner.
##  [1] BUGGAY, DAVID S. MR.        MEISLER, HERBERT A. MR.    
##  [3] HIPP, GEORGE                KYNERD, KEVIN B. MR.       
##  [5] ASH, LANDON EDWIN MR.       BUGGAY, DANETTE            
##  [7] KENDRICK, MICHAEL SCOTT MR. MATHIS, CHAD E. DR.        
##  [9] RANKIN, JOHN P.             SMITH, CHADWICK            
## 5165 Levels: AARON, JOAN AARON, JOAN J. MRS. ... ZWAHLEN, RENE
##  [1] STEWART, JIMMY    SUDDERTH, JOHN    WHITMAN, JOHN    
##  [4] CARRIO, CLARENCE  CAMPBELL, JUDITH  COUSINS, JOHN MR.
##  [7] DARBY, DEANA      JENKINS, KRYSTAL  PRINCE, BONNIE   
## [10] NICHOLS, DAKOTA  
## 5165 Levels: AARON, JOAN AARON, JOAN J. MRS. ... ZWAHLEN, RENE

I subset the raw data to contribution amounts greater than $0, which removes the refunded amounts from the data. I wanted to compare the contribution amounts of the top 10 and bottom 10 contributors. The top 10 contributed roughly 1,000 to 3,000 times as much as the bottom 10.
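The subsetting and ranking step might look like the sketch below. The contributor names and amounts are invented, and `n` is set to 2 instead of 10 only to keep the toy data small.

```r
# Invented rows; column names follow the summary output.
contrib <- data.frame(
  contbr_nm = c("A", "A", "B", "C", "C", "D"),
  contb_receipt_amt = c(2700, 2700, 5, -50, 100, 10),
  stringsAsFactors = FALSE
)

# Keep positive amounts only. This drops refund rows but does not net
# them against the original contributions.
positive <- subset(contrib, contb_receipt_amt > 0)

# Total per contributor, sorted from largest to smallest.
by_person <- aggregate(contb_receipt_amt ~ contbr_nm, data = positive, FUN = sum)
by_person <- by_person[order(by_person$contb_receipt_amt, decreasing = TRUE), ]

n <- 2  # the analysis used 10
top_n    <- head(by_person, n)
bottom_n <- tail(by_person, n)

# Ratio between the largest and the smallest contributor totals.
ratio <- top_n$contb_receipt_amt[1] / bottom_n$contb_receipt_amt[n]
print(top_n)
print(bottom_n)
print(ratio)
```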

Ted Cruz received the most contributions, but his average contribution was less than $100. In contrast, Jeb Bush was only fifth on the list, yet his average contribution was around $1,000 and his median appears to be about $250. Maybe rich people like to support Bush.
Ted Cruz received the highest total campaign contributions in AL. Surprisingly, Hillary Clinton was fourth in popularity but received the second-highest total, which means a small number of people contributed a lot of money to her. Sanders was second in the popularity plot but only sixth in the contributed-money plot, which means many people supported him without contributing large amounts. So it might be true that rich people supported Clinton and poorer people supported Sanders. Trump does not rank in either plot.
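The comparison above rests on four per-candidate statistics: count, mean, median, and total. A minimal base-R sketch with invented amounts and shortened candidate names:

```r
# Invented rows for illustration; column names follow the summary output.
contrib <- data.frame(
  cand_nm = c(rep("Cruz", 4), rep("Bush", 2), rep("Sanders", 3)),
  contb_receipt_amt = c(25, 50, 35, 100, 250, 2700, 15, 27, 50),
  stringsAsFactors = FALSE
)

# One row of statistics per candidate.
stats <- do.call(rbind, lapply(
  split(contrib$contb_receipt_amt, contrib$cand_nm),
  function(a) data.frame(n = length(a), mean = mean(a),
                         median = median(a), total = sum(a))
))

# Order by popularity (number of contributions).
stats <- stats[order(stats$n, decreasing = TRUE), ]
print(stats)
```

In this toy data the candidate with the fewest contributions has the largest mean and total, which mirrors the pattern described for Bush and Clinton.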

This box plot shows similar information to the plot above, including the median and mean.

Most contribution amounts fall below $1,000. The next stripe is at around $1,000, followed by a small stripe at around $2,000, another at around $3,000, and a few points at around $6,000.
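Stripes in a scatter plot usually mean that a few exact amounts occur very often. A frequency table of the amounts makes this visible directly. The amounts below are invented, and the multiple-of-25 check is just one assumed way to define a "round" amount.

```r
# Invented contribution amounts for illustration.
amounts <- c(25, 25, 50, 100, 100, 100, 1000, 1000, 2700, 37.5)

# Most frequent exact amounts; each frequent value shows up as a stripe.
freq <- sort(table(amounts), decreasing = TRUE)
print(head(freq, 3))

# Share of contributions that are a multiple of $25 (assumed definition
# of a round amount).
share_round <- mean(amounts %% 25 == 0)
print(share_round)
```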

Retired people contributed the most money in this election. Among them, contributions to Clinton and Ted Cruz grow as time goes by, while contributions to Sanders and Rubio do not appear to be steady.

This plot shows the daily mean and the daily sum divided by 10 of the contribution amount. I just wanted to try another kind of plot.
Nothing special stands out. The mean is nearly constant over time, but the daily sum increases, so more people are contributing to their candidates.
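The two daily series can be computed as in the sketch below. The dates and amounts are invented; the division by 10 follows the description above and only rescales the sum so both series fit on one axis.

```r
# Invented rows; modifiedDate is the parsed receipt date.
contrib <- data.frame(
  modifiedDate = as.Date(c("2016-02-28", "2016-02-28",
                           "2016-02-29", "2016-02-29", "2016-02-29")),
  contb_receipt_amt = c(50, 150, 20, 40, 240)
)

# Group by calendar day (formatted as text so the grouping key is explicit).
day_key    <- format(contrib$modifiedDate)
daily_mean <- tapply(contrib$contb_receipt_amt, day_key, mean)
daily_sum  <- tapply(contrib$contb_receipt_amt, day_key, sum)

daily <- data.frame(
  date      = as.Date(names(daily_mean)),
  mean      = as.numeric(daily_mean),
  sum_div10 = as.numeric(daily_sum) / 10
)
print(daily)
```

Here the mean stays at 100 on both days while the scaled sum rises from 20 to 30, which is what happens when more people give similar amounts.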

Popularity and mean value. Ted Cruz received a large number of contributions relative to his mean contribution amount: even though the individual amounts were small, he received many of them. As popularity decreases, the mean value is generally higher.
## [1] "Cruz, Rafael Edward 'Ted'" "Sanders, Bernard"         
## [3] "Carson, Benjamin S."       "Clinton, Hillary Rodham"  
## [5] "Rubio, Marco"

As expected, Ted Cruz and Ben Carson received a lot of money, and a few of their contributions are over $10,000. Carson's amounts are mostly higher than those of the other four candidates. The other four candidates have almost the same 75th-percentile amount.

Bivariate Analysis

Talk about some of the relationships you observed in this part of the investigation. How did the feature(s) of interest vary with other features in the dataset?

The relationship between candidate and contribution date was interesting. I could track, at least indirectly, how a candidate's popularity changed over time. Among retired people, contributions to Ted Cruz keep increasing in the plot, as do contributions to Hillary Clinton.
Contribution amount against date was also interesting. It shows how much people typically contribute to a candidate. There were stripes below about $800, at around $1,000, and at around $3,000.

Did you observe any interesting relationships between the other features (not the main feature(s) of interest)?

Looking at the contributions from retired people, the mean contribution amount fluctuated a lot at the beginning of 2015 and converged by the beginning of 2016, even though the total amount contributed increased. People gave small amounts, but more people contributed to a candidate.

What was the strongest relationship you found?

Big cities such as Birmingham, Mobile, and Montgomery have many more residents than rural areas, so they contributed more money to the candidates they support. Also, since retired people account for most of the contributions, we could say that older people take more interest in politics.

Multivariate Plots Section

## [1] Sanders, Bernard          Cruz, Rafael Edward 'Ted'
## [3] Rubio, Marco              Clinton, Hillary Rodham  
## [5] Carson, Benjamin S.      
## 21 Levels: Cruz, Rafael Edward 'Ted' < ... < Stein, Jill

Montgomery and Tuscaloosa supported the Democratic Party more than the Republican Party, and Hillary Clinton received the most support in those cities. In the other cities, Republican candidates mostly received the higher contributions.

Among Democrats, Sanders received contributions more often than Clinton, even though the amounts were small. Among Republicans, Ted Cruz dominated contribution frequency in almost every city.
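The dataset has no party column, so a city-by-party comparison needs a candidate-to-party lookup. The sketch below shows one way to build it. The lookup vector `party_of` is a hypothetical name, and the city rows and amounts are invented; only the candidate names and their primaries are taken as given.

```r
# Hypothetical lookup from candidate name (as printed in the output) to party.
party_of <- c(
  "Cruz, Rafael Edward 'Ted'" = "Republican",
  "Carson, Benjamin S."       = "Republican",
  "Rubio, Marco"              = "Republican",
  "Sanders, Bernard"          = "Democrat",
  "Clinton, Hillary Rodham"   = "Democrat"
)

# Invented rows for illustration.
contrib <- data.frame(
  cand_nm = c("Clinton, Hillary Rodham", "Cruz, Rafael Edward 'Ted'",
              "Cruz, Rafael Edward 'Ted'", "Sanders, Bernard",
              "Carson, Benjamin S."),
  contbr_city = c("MONTGOMERY", "MONTGOMERY",
                  "BIRMINGHAM", "BIRMINGHAM", "BIRMINGHAM"),
  contb_receipt_amt = c(500, 100, 300, 50, 200),
  stringsAsFactors = FALSE
)
contrib$party <- unname(party_of[contrib$cand_nm])

# Total dollars by city and party.
by_city <- xtabs(contb_receipt_amt ~ contbr_city + party, data = contrib)
print(by_city)

# Party with the larger total in each city.
leading <- setNames(colnames(by_city)[apply(by_city, 1, which.max)],
                    rownames(by_city))
print(leading)
```

Replacing the summed amount with a row count would give the frequency view instead of the dollar view.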

In December 2015, Birmingham and Huntsville contributed more than usual.
In 2016, April looks lower than expected. I guess this is because the data was collected in the middle of the month.

Monthly mean value for the top five candidates
Ben Carson and Sanders received steady contributions from the beginning until March 2016. Marco Rubio's mean amount was high from November 2015 onward. In contrast, Hillary Clinton received high mean contributions before November 2015.
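A monthly mean per candidate can be tabulated as in the sketch below. The rows and shortened names are invented, and `k` is set to 2 instead of 5 to keep the toy data small.

```r
# Invented rows; modifiedDate is the parsed receipt date.
contrib <- data.frame(
  cand_nm = c("Cruz", "Cruz", "Cruz", "Rubio", "Rubio", "Rubio", "Paul"),
  modifiedDate = as.Date(c("2015-11-03", "2015-11-20", "2016-01-15",
                           "2015-11-10", "2016-01-05", "2016-01-25",
                           "2015-11-12")),
  contb_receipt_amt = c(50, 100, 30, 1000, 200, 400, 10),
  stringsAsFactors = FALSE
)

# Keep the k candidates with the most contributions (the analysis used 5).
k <- 2
top <- names(sort(table(contrib$cand_nm), decreasing = TRUE))[seq_len(k)]
top_rows <- contrib[contrib$cand_nm %in% top, ]

# Year-month key, then mean amount per candidate and month.
top_rows$month <- format(top_rows$modifiedDate, "%Y-%m")
monthly_mean <- tapply(top_rows$contb_receipt_amt,
                       list(top_rows$cand_nm, top_rows$month), mean)
print(monthly_mean)
```

Cells for months in which a candidate received nothing come back as `NA`, which is worth keeping in mind before plotting lines through them.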

Multivariate Analysis

Talk about some of the relationships you observed in this part of the investigation. Were there features that strengthened each other in terms of looking at your feature(s) of interest?

There is a trend between the date and the amount contributed to each candidate.
Monthly mean value for the top five candidates
Ben Carson and Sanders received steady contributions from the beginning until March 2016. Marco Rubio's contributions rose suddenly from November 2015. In contrast, Hillary Clinton received large contributions before November 2015 and smaller amounts after that. The trend changed a lot over the three months from October 2015 to December 2015.

Were there any interesting or surprising interactions between features?

Popularity and mean contribution amount do not necessarily go together. Even though Ted Cruz was strongly supported in Alabama, his mean contribution was small. Jeb Bush, on the other hand, was supported by few people, but his average contribution amount was the highest. The funny thing is that the more people support a candidate, the smaller the mean contribution amount tends to be. I am still curious about the relationship between popularity and total contribution amount.

OPTIONAL: Did you create any models with your dataset? Discuss the strengths and limitations of your model.

Final Plots and Summary

Description One

Description Two

On the right side of the plot, Ben Carson received steady contributions, but the amounts were not large. Like Sanders, he might be supported by blue-collar workers. As I mentioned before, I don't know politics well, so everything here is just what I see in the plot. By contrast, Clinton and Rubio sometimes received large amounts and might be supported by white-collar workers.

Description Three.

Reflection

The plots above suggest that people support their candidate at the end of each month, when I think they have enough money. I am not sure when retired people receive their monthly retirement income, such as 401(k) payments, but it is probably at the end of the month. The amount contributed also increases as the election gets closer. I could see stripes in the contribution amounts at $3,000, $1,000, $700, $500, and $300.
There are trends in the contributions, and they could be one way to think about candidate popularity. For some candidates, such as Ted Cruz, Hillary Clinton, and Sanders, contribution frequency increased as the election approached. For others, such as Rubio and Ben Carson, it stopped increasing at some point.
Going by the data plots so far, three people remain for the final election: Hillary Clinton, Sanders, and Ted Cruz. In reality, however, it is Donald Trump and not Ted Cruz. So it is hard to say that contribution amount is directly connected to popularity, although Donald Trump may be a special case, a very wealthy candidate with less need for contributions. If I had a chance to analyze the data for the whole USA, it could be more interesting than Alabama alone, but I am still satisfied with the result, and it was fun.